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^ Abstract. We develop an active set algorithm for the maximum likelihood estimation of a 

log-concave density based on complete data. Building on this fast algorithm, we indicate an EM 
algorithm to treat arbitrarily censored or binned data. 
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1 Introduction 



' A probability density / on the real line is called log-concave if it may be written as 

; f{x) = exp^(x) 

for some concave function 4> '■ K — > [— oo, oo). The class of all log-concave densities provides an 
interesting nonparametric model consisting of unimodal densities and containing many standard 
parametric families; see Diimbgen and Rufibach (2009) for a more thorough overview. 

i> : 

This paper treats algorithmic aspects of maximum likelihood estimation for this particular 



class. In Section [2] we derive a general finite-dimensional optimization problem which is closely 
related to computing the maximum likelihood estimator of a log-concave probability density / 
based on independent, identically distributed observations. Section [3] is devoted to the latter op- 
timization problem. At first we describe generally an active set algorithm, a useful tool from 
optimization theory (cf. Fletcher, 1987) with many potential applications in statistical computing. 
A key property of such algorithms is that they terminate after finitely many steps (in principle). 
Then we adapt this approach to our particular estimation problem, which yields an alternative 
to the iterative algorithms developed by Rufibach (2006, 2007) and Pal, Woodroofe and Meyer 
(2006). The resulting active set algorithm is similar in spirit to the vertex direction and support 
reduction algorithms described by Groeneboom, Jongbloed and Wellner (2008), who consider the 
special setting of mixture models. 
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In Section |4] we consider briefly the problem of estimating a probability distribution P on 
(0, oo] based on censored or binned data. Censoring occurs quite frequently in biomedical ap- 
plications, e.g. X being the time point when a person develops a certain disease or dies from a 
certain cause. Another field of application is quality control where X is the failure time of a cer- 
tain object. A good reference for event time analysis is the monograph of Klein and Moeschberger 
(1997). Binning is typical in socioeconomic surveys, e.g. when persons or households are asked 
which of several given intervals their yearly income X falls into. We discuss maximum likelihood 
estimation of P under the assumption that it is absolutely continuous on (0, oo) with log-concave 
probability density /. The resulting estimator is an alternative to those of Diimbgen et al. (2006). 
The latter authors restrict themselves to interval-censored data and considered the weaker con- 
straints of / being non-increasing or unimodal. Introducing the stronger but still natural constraint 
of log-concavity allows us to treat arbitrarily censored data, similarly as Turnbull (1976). In Sec- 
tion [5] we indicate an expectation-maximization (EM) algorithm for the estimation of P, using 
the aforementioned active set algorithm as a building block. This approach is similar to Turnbull 
(1976) and Braun et al. (2005); the latter authors considered self-consistent kernel density esti- 
mators. For more information and references on EM and related algorithms in general we refer 
to Lange et al. (2000). A detailed description of our method for censored or binned data will be 
given elsewhere. 

Section [6] contains most proofs and various auxiliary results. 

2 The general log-likelihood function for complete data 

Independent, identically distributed observations. Let Xi,X 2 , . . . , X n be independent ran- 
dom variables with log-concave probability density / = exp 4> on R. Then the normalized log- 
likelihood function is given by 

n 

l{<j>) := n- 1 ^^)- 

2=1 

It may happen that due to rounding errors one observes X{ in place of X{. In that case, let x\ < 
x 2 < ■ ■ ■ < x m be the different elements of {Xi, X 2 , . . . , X n } and define pi := n" 1 ^^ : Xj = 
xi}. Then an appropriate surrogate for the normalized log-likelihood is 

m 

i=l 
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The general log-likelihood function. In what follows we consider the functional (Q]) for ar- 
bitrary given points x\ < x 2 < • • • < x m and probability weights Pi,P2, ■ ■ ■ ,Pm > 0, i.e. 
XX=i Pi = Suppose that we want to maximize £((p) over all functions <p within a certain family 
F of measurable functions from R into [— oo, oo) satisfying the constraint f exp<^>(x) = 1. If 
F is closed under addition of constants, i.e. <fi + c G J 7 for arbitrary </> G J 7 and c£l, then one 
can easily show that maximizing £(<p) over all G J 7 with J exp dx = 1 is equivalent to 
maximizing 

m » 
L(</>) := y^pj0(xj) - / exp0(x)da; 
i=i ^ 

over the whole family J 7 ; see also Silverman (1982, Theorem 3.1). 

Restricting the set of candidate functions. The preceding considerations apply in particular 
to the family F of all concave functions. Now let Q be the set of all continuous functions tp : 

[xi,x rn ] — > R which are linear on each interval [xk, x k +i], I < k < m, and we define tp := — oo 
on R \ [x\, x m ]. Moreover, let G CO nc be the set of all concave functions within Q. For any <p G F 
with L{<p) > — oo let tp be the unique function in Q conc such that tp = <p on {^i, ^2, . . . , x m }. 
Then it follows from concavity of <p that tp < <p pointwise, and L(tp) > L(<fi). Equality holds if, 
and only if, tp = cp. Thus maximizing L over the class F is equivalent to its maximization over 

cone- 

Properties of L(-). For explicit calculations it is useful to rewrite L(tp) as follows: Any function 
tp G Q may be identified with the vector tp := (tp(xi))'^ =1 G R m . Likewise, any vector tp G R m 
defines a function tp G Q via 

tp(x) := (l~ X V>fc + X iPk+i forx e[x k ,x k+1 ],l<k<m, 

where 5k ■= Xk+i — Xk- Then one may write 

m m—l 

L(ip) = L(ip) := ^Pitpi - 8 k J(tp k ,tpk+i) 
i=l fe=i 

with 

J(r,s) := / exp((l -t)r + ts) dt 
Jo 

for arbitrary r, s G R. The latter function J : R x R — > R is infinitely often differentiable and 
strictly convex. Hence L(-) is an infinitely often differentiable and strictly concave functional on 
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\ In addition it is coercive in the sense that 



L(V>) 



-oo as — > oo. 



(2) 



This entails that both 



t/j := argmaxL(^) and 
t/j := argmaxL(^) 

V'CSconc 



(3) 
(4) 



are well defined and unique. 

Let us discuss some further properties of L(-) and its unrestricted maximizer ip. To maximize 
L(-) we need its Taylor expansion of second order. In fact, for functions rp,v 6 Q, 



d_ 
~dt 

dt 2 



m ,. 

L{ip + tv) = y^pji?(a^) — / v (x) exp ip{x) dx, 

i=l 



t=0 



L(ip + tv) 



v(x) 2 exp ip{x) dx. 



(5) 
(6) 



Note that the latter expression yields an alternative proof of L's strict concavity. Explicit formulae 
for the gradient and hessian matrix of L as a functional on W m are given in Section |6j and with 
these tools one can easily compute ip very precisely via Newton type algorithms. We end this 
section with a characterization and interesting properties of the maximizer ip. In what follows let 



Jab(r,s) :-- 



^- b J(r,s) = J (l-t) a t b exp((l-t)r + ts)dt. 



for nonnegative integers a and b. 

Theorem 2.1 Letip £ Q with corresponding density f(x) := expip(x) and distribution function 
F{r) := f(x) dx on [xi,x m ]. The function ip maximizes L if, and only if, its distribution 
function F satisfies 



F(x m ) = 1 and / F{x)dx = 2Z Pi for 1 - k < 

Jxk i= 1 



m. 



In that case, 



and 



r 

J Xl 



xf(x)dx = y~]piXj 
i=l 



/ x 2 f(x)dx = ^PiX 2 - ^ SlJ n (^k,^k+l) 
Jxi i=l k=i 
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Some auxiliary formulae. For tp G Q with density fix) := exp^(x) and distribution function 
Fir) := f(x) dx on [xi, x m ], one can easily derive explicit expressions for F and the first two 
moments of / in terms of J(-, ■) and its partial derivatives: For 1 < k < m, 

k 

F(x k+1 ) = ^SiJ^ipi+i) 

i=l 

and 

^k 1 / F(x)dx = F{x k ) + 5 k Jio(4>k, 4>k+i) e (F(x k ),F(xk+i)). 

Jx h 

Moreover, for any a£l, 

m— 1 

t 

(x - a)f(x)dx = } ) S k ((x k - 0)^0(^,^+1) + (xk+i ~ Q)Joi('0fc,'0fc+l)), 
fe=l 
m— 1 

(x - a) 2 /(x) = ^ 4((x fc - a) 2 J 10 (ip k ,ip k+1 ) + (x fe +i - a) 2 J i(V>fc, V>fc+i)) 
Xi fc=i 

m— 1 

- ^i-^i^fc^fc+i)- 
fc=i 

3 An active set algorithm 
3.1 The general principle 

We consider an arbitrary continuous and concave function L : M m — > [—00, 00) which is coercive 
in the sense of (O and continuously differentiable on the set dom(L) := {tp G M m : L(ip) > 
—00}. Our goal is to maximize L on the closed convex set 



K 



|-0 G R m : vjip < a for % = 1, . . . , g} 



where V\, . . . , v q are nonzero vectors in M m and ci, . . . , c q real numbers such that K PI dom(L) 7^ 
0. These assumptions entail that the set 

/C* := argmax L(ip) 

is a nonvoid and compact subset of dom(L). For simplicity we shall assume that 

vx,V2, ■ ■ ■ ,v q are linearly independent, (7) 

but see also the possible extensions indicated at the end of this section. 

An essential tacit assumption is that for any index set A C {1, . . . , q} and the corresponding 
affine subspace 

V{A) : = {^eR ra : vjij) = c a for all a G a\ 



of M m , we have an algorithm computing a point 

ij>(A) G V*(A) := argmax L(if>), 

provided that V(A) n dom(L) ^ 0. Now the idea is to vary A suitably until, after finitely many 
steps, tft(A) belongs to JC*. 

In what follows we attribute to any vector ip G M. m the index set 



For ip G /C the set .A(V>) identifies the "active constraints" for ip. The following theorem provides 
useful characterizations of /C* and V*(A). 

Theorem 3.1 Let 61, . . . , b m be a basis of M. m such that 

T, /< ifi=j < q, 
3 \= eise. 

(a) A vector ijj G K n dom(L) belongs to /C* if, and only if, 

tfVL<*) J =0 «"»^ {!,..., m}\*»), 
\< for alii € A{ip). 

(b) For any given set A C {l,...,g} f a vector ?/> G V(A)ndom(L) belongs toV*(A) if, and only 
if, 

bJVL(tp) = for aii i G {1, . . . , m} \ A. (9) 

The characterizations in this theorem entail that any vector i/> G /C* belongs to V*(A(i/>)). The 
active set algorithm performs one of the following two procedures alternately: 

Basic procedure 1: Replacing a feasible point with a "conditionally" optimal one. Let ip be 

an arbitrary vector in K, n dom(L). Our goal is to find a vector t^ new such that 

L(</W) > Lty) and </> ncw G /Cn V t (^ ncw )). (10) 

To this end, set A := A(ip) and define the candidate vector Vcand := ^{A). By construction, 
^(^cand) > £(V0- If L (VWd) = we set </> ncw := ^. If L(?/> cand ) > L(tp) and 

'V'cand e we set ''Anew := ''/'cand- Here (flOl) is satisfied, because A(tp ncw ) D ^(V*), so that 
V(A(^ new )) C V(A). Finally, if L(V cand ) > L(V) but ^ cand /C, let 

t = t(^,^ cand ) := max{t G (0, 1) : (1 - t)rj> + ^ cand G £} (11) 

Ci - vjip 

mm 



I^T ~ TT : 1 - { ~ ^'^V'cand > Ci}- 



Then we replace i/> with (1 — t)tf) + ttp cand . Note that L(ip) does not decrease in this step, due 
to concavity of L. Moreover, the set A(ip) increases strictly. Hence, repeating the preceding 
manipulations at most q times yields finally a vector i/> new satisfying (fTOl) . because V({1, . . . , q}) 
is clearly a subset of JC. With the new vector ip Dew we perform the second basic procedure. 

Basic procedure 2: Altering the set of active constraints. Let ip G /C n dom(L) n V* (A) with 
A = A(ip). It follows from Theorem [3TT1 that ip belongs to /C* if, and only if, 

&JVL(V>) < for all a G A. 

Now suppose that the latter condition is violated, and let a D = a Q (tp) be an index in A such 
that b a VL{ip) is maximal. Then ip + tb ao G K, and -A(V> + tb ao ) = A \ {a Q } for arbitrary 
t > 0, while + ifo ao ) > L{xj)) for sufficiently small t > 0. Thus we consider the vector 
-0 cand := ip(A \ {a }), which satisfies necessarily the inequality L(ip cand ) > L(i/>). It may fail 
to be in K, but it satisfies the inequality 

^ol^cand > c a a ■ 

For -0 can d — V> may be written as \ ao b ao + YligA ^fii with rea l coefficients Ai, . . . , A m , and 

< (^ cand -^) T VL(^) = A ao bJ o VL(^) 

according to ©. Hence < A ao = vJ o (ip cand - if>) = vJ o if> caxid - c ao . If -0 cand G /C, we repeat 
this procedure with Vcand m pl ace of ^. Otherwise, we replace ip with (1 — t)tp + tip cand , where 
i = t(t/j, i/^cand) > is defined in (fTTT) . which results in a strictly larger value of L(tp). Then we 
perform the first basic procedure. 

The complete algorithm and its validity. Often one knows a vector tp Q G /C n dom(L) in 

advance. Then the active set algorithm can be started with the first basic procedure and proceeds 
as indicated in Table [TJ In other applications it is sometimes obvious that V({1, . . . , q}), which 
is clearly a subset of K, contains a point in dom(L). In that case the input vector i/> is super- 
fluous, and the first twelve lines in Table Q] may be simplified as indicated in Table [2] The latter 
approach with starting point ip Q = ^({1, . . . , q}) may be numerically unstable, presumably when 
this starting point is very far from the optimum. In the special settings of concave least squares 
regression or log-concave density estimation, a third variant turned out to be very reliable: We 
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start with A = and tp = ip(A). As long as tp Q £ K, we replace A with the larger set A(tj> ) 
and recompute iJj = ip(A); see Table [3] 

In TableQ] the lines marked with (*) and (**) correspond to the end of the first basic procedure. 
At this stage, tp is a vector in /C PI dom(L) n V*(A(if))). Moreover, whenever the point (**) is 
reached, the value L(ip) is strictly larger than previously and equal to the maximum of L over 
the set V(A). Since there are only finitely many different sets A C {1, . . . , q}, the algorithm 
terminates after finitely many steps, and the resulting if? belongs to /C by virtue of Theorem |3.1| 

When implementing these algorithms one has to be aware of numerical inaccuracies and errors, 
in particular, if the algorithm ip(-) yields only approximations of vectors in V*(-). In our specific 
applications we avoided endless loops by replacing the conditions "6jVL(?/>) < 0" and "vjtp > 
c" with "bJVL(if}) < — e" and "vjtf) > a + e", respectively, for some small constant e > 0. 



Algorithm ip -s— ActiveSetl(L, ip(-), i/) ) 
A <- Mrj}) 

</>cand *~ i>( A ) 
whil e V'cand ^ & d0 

l/t <- (1 - t(V>, V'cand))^ + VWd^cand 

A <- A(t/>) 

end while 

A «- A(V>) (*) 

while max agj 4 bJVL(?/>) > do 

a <- min (argmax ag ^ bJVL(») 

A «- A \ {a} 

V'cand <- 

while Vcand ^ K do 

V> <- (1 - t(l/>, VWd))V> + i (V', </Wd)VWd 

A <- A(V>2 
end while 

^ <- V'cand 

A <- A(^) (**) 

end while. 

Table 1 : Pseudo-code of an active set algorithm. 



Possible extension I. The assumption of linearly independent vectors v\ , . . . , v q has been made 
for convenience and could be relaxed of course. In particular, one can extend the previous consid- 



8 



Algorithm ip «— ActiveSet2(L, -*/>(•)) 

^«-^({l,...,g}) 

A<-{l,...,q} 

while max agj 4 bJVL(tp) > do 
end while. 



Table 2: Pseudo-code of first modified active set algorithm. 



Algorithm ip <— ActiveSet3(L, 

V <- -0(0) 

while -0 /C do 

^ «- if) (A) 
end while 

A <- A(ij>) 

while max aSj 4 bJVL(tp) > do 
end while. 



Table 3: Pseudo-code of second modified active set algorithm. 

erations easily to the situation where K consists of all vectors ij) £ M m such that 

Q,i < vjij) < c ij2 
for 1 < i < q with numbers — oo < c^i < < oo. 

Possible extension II. Again we drop assumption (0 but assume that ci = • • • = c q = 0, so 
that K is a closed convex cone. Suppose further that we know a finite set 8 of generators of K,, i.e. 
every vector £ JC may be written as 

with numbers A e > 0. In that case, a point ip G /C n dom(L) belongs to /C* if, and only if, 

VL(ip) T tp = and max VL(ip) T e < 0. (12) 

Now we can modify our basic procedure 2 as follows: Let ip € K, n dom(L) n V(A) with ^4 := 
A(^). If CLUl is violated, let e(tp) € 5 such that V L(il>) T e(il>) > 0. Further let s(ip),t(ip) > 
such that t/> ncw := s(ij>)ip + t(ij))e(ip) G /C satisfies L(tp ncw ) > L(i/>). Then we replace ^ with 
i/> ncw and perform the first basic procedure. 
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3.2 The special case of fitting log-concave densities 

Going back to our original problem, note that ijj € Q lies within Q conc if, and only if, the corre- 
sponding vector if) satisfies 

~ ^ - ^ = vJV> < for j = 2, . . . , m - 1, (13) 

where = ! has exactly three nonzero components: 

v j-hj '■= Vfy-i, v j,j ■= -tfj-i + S j )/(S j - 1 Sj), v j+1J := 1/Sj. 

Note that we changed the notation slightly by numbering the m — 2 constraint vectors from 2 to 
m — 1. This is convenient, because then vjif) ^ is equivalent to the corresponding function 
tp € Q changing slope at Xj. Suitable basis vectors bi are given, for instance, by b\ := (1)™ 1; 
b m ■= (xi)iLi and 

bj = (mm(xj - xj, 0))™ v 2 < j < m. 

For this particular problem it is convenient to rephrase the active set method in terms of inactive 
constraints, i.e. true knots of functions in Q. Throughout let / = ■ ■ ■ , i(k)} be a subset of 

{1,2,..., m} with k > 2 elements 1 = < • • • < i(k) = m, and let Q{I) be the set of all 
functions ip G Q which are linear on all intervals [xif s \, 1 < s < k. This set corresponds 

to V(A) with A := {1, . . . , m} \ I. A function ip € Q(I) is uniquely determined by the vector 

{^( x i(s))) k 8=v and one ma Y write 

k k-1 
s=l s=l 

with suitable probability weights pi(I), ■ ■ ■ ,Pk(I) > 0. Precisely, writing 

${x) = X ' (S+1 |_ X 1p(Xi( s) ) + 4>(Xi( s+ l)) 

x i(s+l) x i(s) x i(s+l) x i(s) 

for 1 < s < k and x^ < x < x^ s+1 ^ yields the explicit formulae 

i(2)-l 



^1(2) ~~ x i 
i(s+l)-l 



pi (-0 = V 

X i(2)~ X l 



, T s ■ ( Xi X i(s—1) x i(s+i) x i \ r _ , , 

Ps(j) = > mm — , I pj for 2 < s < fc, 

i=i(7^)+i Va; iW- a; i(«-i) x ^+i) ~ 7 



■^i -^(fc— 1) 



Pi- 



i=i(fc-l)+l Xm Xi(fe - 1} 
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Consequently, the computation of ip or ip^ := argmax^ 6 g/n L(ip) are optimization problems of 
the same type. 

Since the vectors fc»2, • • , b m correspond to the functions A2, . . . , A m in Q with 

Aj(x) := min(x — xj, 0), (14) 

checking the inequality bJVL(ip) < for a G A amounts to checking whether the directional 
derivative 

H jW '■= ^Pi^jixi) - / Aj(x)expip(x)dx (15) 
i=i Jx * 

is nonpositive for all j € {1, . . . , m} \ I. If ip = tpW and j /, the inequality Hj(ip) > means 
that L(ip) could be increased strictly by allowing an additional knot at Xj. 

Example 3.2 Figure Q] shows the empirical distribution function of n = 25 simulated random 
variables from a Gumbel distribution, while the smooth distribution function is the estimator 
F(r) := exptjj(x) dx. Figure [2] illustrates the computation of the log-density ip itself. Each 
picture shows the current function tp together with the new candidate function V'cand- We followed 
the algorithm in Table |2j so the first (upper left) picture shows the starting point, a linear function 
ip on [x±, X25], together with tp can d having an additional knot in (xi, X25). Since V'cand is concave, 
it becomes the new function ip shown in the second (upper right) plot. In the third (lower left) 
plot one sees the situation where adding another knot resulted in a non-concave function V'cand- 
So the current function ip was replaced with a convex combination of tp and V'cand- The latter new 
function ijj and the almost identical final fit tp are depicted in the fourth (lower right) plot. 

4 Censored or binned data 

In the current and the next section we consider independent random variables X\, X2, . . . , X n 
with unknown distribution P on (0,oo] having sub-probability density / = expcj) on (0, 00), 
where <fi is concave and upper semicontinuous. In many applications the observations Xi are not 
completely available. For instance, let the Xi be event times for n individuals in a biomedical 
study, where Xi = 00 means that the event in question does not happen at all. If the study ends 
at time Cj > from the z-th unit's viewpoint, whereas Xi > a, then we have a "right-censored" 
observation and know only that Xi is contained in the interval Xi = (ci ,00]. In other settings 
one has purely "interval-censored" data: For the i-th observation one knows only which of given 
intervals (0, i^i], (^,1,^,2], • • • , (^, m («)> 00 ] contains Xi, where < U : i < ■ ■ ■ < t i)Tn u-\ < 00. If 
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Figure 1: Estimated distribution functions for n = 25 data points. 

these candidate intervals are the same for all observations, one speaks of binned data. A related 
situation are rounded observations, e.g. when we observe \X{\ rather than Xi. 

In all these settings we observe independent random intervals X\, X%, . . . , X n . More precisely, 
we assume that either X* = (Li,Ri\ 3 X{ with < Li < Ri < oo, or Xi consists only of the one 
point Li := Ri := Xi G (0, oo). The normalized log-likelihood for this model reads 



£(<!>) := n- 1 J2(l{L i = R i }cf>(X,, 



i=i 



(16) 



+ l{Li < Ri}log(^ J exp4>(x) dx + l{Ri = oo}p c 



where 



exp 4>(x) dx G [0,1]. 



5 An EM algorithm 



Maximizing the log-likelihood function £((f>) for censored data is a non-trivial task and will be 
treated in detail elsewhere. Here we only indicate how this can be achieved in principle, assuming 
for simplicity that P({oo}) = 0, i.e. J °° exp <j){x) dx = 1 and = 0. In this case, the log- 
likelihood simplifies to 

£((f>) = n-^fl^i = Ri}<KXi) + l{Li KRijlogU ' exp <f>(x) dx^j ) . 

i=i ^ ' 
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Figure 2: Estimating the log-density for n = 25 data points. 
Again one may get rid of the constraint J °° exp 4>(x) dx = 1 by considering 

/•oo 

L{ff>) := l(<f>)- / exp(f)(x)dx (17) 



o 



for arbitrary concave and upper semicontinuous functions (p '■ (0, oo) — > [— oo, oo). 

A major problem is that £((f>) is not linear but convex in <fi. Namely, for v : (0, oo) — > R and 

< L < R < oo, 



dt" 



t=Q 



^oT^^H-ftSSrAftSD (18) 



Thus we propose to maximize £(0) iteratively as follows: Starting from a function with L(4>) > 
— oo, we replace the target function L(<p new ) with 

d 



M</>ncw |0) •- ^ 



t=0 v 



;>oo 

v - 4>)) - I exp </> n ewO) 
J 

By means of (fT8T >. this may be written as 

/yoo 
4> ncw (x) P(dx\(f>) - J exp<f> new (x)dx, (19) 
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where 

P(-\<f>) := n- l ^(l{L i = B 4 }5 x . + l{L i <R i }C 4> (X\X G (^,i? i ])V 
i=i ^ ' 

a probability measure depending on the data and on <fi. In other words, for any Borel subset B of 

(0, oo), 

/ fu^fr u ^ exp 6(x) d,X\ 

P{B\<j>) := n- 1 V 1{U = Ri g B} + 1{L, < fij ^ . 

Note also that L(</> new | 0) equals the conditional expectation of the complete-data log-likelihood 
£(0new)> given the available data and assuming the current (f> to be the true log-density: 

L((pnew | <t>) = E0(L(0 new ) | Xi £ Xi for 1 < i < n), 

where the Xj are treated temporarily as fixed. 

After approximating the probability measure P(- 1 (j)) by a discrete distribution with finite sup- 
port, one can maximize L((p new \ (j>) over all concave functions 4> new with the active-set algorithm 
presented in Section [3] Then we replace 4> with <p ncw and repeat this procedure until the change of 
4> becomes negligable. 

6 Auxiliary results and proofs 

Explicit formulae for J and some of its partial derivatives. Recall the auxiliary function 
J(r, s) := f exp((l — t)r + ts) dt. One may write explicitly 



J(r, s) = J(s, r) 



(exp(r) — exp(s)) / (r — s) if r ^ s, 
exp(r) if r = s, 



or utilize the fact that J(r, s) = exp(r) J(0, s — r) with J(0, 0) = 1 and 



J(0,y) = (ex P (y)-l)/y = Et^TW" 



fc=0 v 

To compute the partial derivatives J a b(r, s) of J(r, s), one may utilize the facts that J a b{r, s) 
Jba(s, r) = exp(r) J a f,(0, s — r). Moreover, elementary calculations reveal that 

oo u 

y 



J 10 (0,y) = (ex P (y)-l- y )/y 2 = E (^2)7 ' 

oo 

J 20 (0, y) = 2(exp(y) - 1 - y - y 2 /2) /y 3 = £ 



,. :0 (* + 3)! : 



OO /I, i 1 \ fc 

Jn(0,y) = (y(exp(y) + l)-2(exp(y)-l))/y 3 = ^ L+ 2* , 

fc=o i fc + ,:i J- 
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The Taylor series may be deduced as follows: 

Jab{o,y) 



[\l-t) a t b e ty dt 
Jo 

00 k pi 

Ext / (1- * 

a\{b + k)\ 



OO JU 

E" 



fe=0 

00 

E 

fc=0 



k\ (k + a + b + 

a\(b + k)\y k 
k\(k + a + b + l)V 



according to the general formula f Q (1 — t) k t e dt = k\t\ /(k + £ + 1)! for integers fc, £ > 0. 

Numerical experiments revealed that a fourth degree Taylor approximation for J a &(0,y) is 
advisable and works very well if 

'0.005 (a = 6 = 0), 
\y\ < I 0.01 [a + b = 1), 
0.02 (a + b = 2). 

Explicit formulae for the gradient and hessian matrix of L. At if? e M m these are given by 

' 5iJ 10 (lpl,1p2) if = 1, 

£(V>) = Pk -\ 4-i^oi hPk-ii^k) + ^10(^,^+1) if2</c<m, 

^ m -i JQl{lp m -\,1pm) if /c = 771, 

SiJ2o(i>i,ih) tfj = k=l, 

Sk-iJa2(i>k-i,ipk) + 8kJ2a(ipk,ipk+l) H2<j = k<m, 

L(ip) = { 5 m -iJ 2(ipm-i,tp m ) i£j = k = m, 

djJn(^j,tpk) if 1 < 3 = k - 1 < m, 

if|j-fc|>l. 



d 



d 2 

dtp j dip k 



Proof of (0. In what follows let min(w) and max(u) denote the minimum and maximum, re- 
spectively, of all components of a vector v. Moreover let R(v) := max(i>) — min(i>). Then with 
P '■= (Pj)f=i and 8 = (^fc)fc^i 1 . note first that 

L{ip) < max(i/>) — (x m — x\) exp(mm(-i/>)) 

= R(ip) + min(i/>) — (x m — x\) exp(min(V»)) 
— > —oo as — > oo while -R("0) < 
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for any fixed r < oo. Secondly, let ipj := tpj — mm(t/)). Then min(i/>) = 0, max(^) = R(tl>), 
whence 



m ~ r x m _ 

L(ip) = J y^' j Pi'4>i + min('0) — exp(min(V>)) / exp(?/>(x)) (fx 
i=i 

< (1 — min(p)) -R(V') + supf s — exp(s) / exp(^(ar)) dx] 

= (1 — min(p)) R(ip) — log / exp(^(a;)) dx — 1 

m— 1 

= (1 - min(p)) E(V>) - log (J] 6 k J$ k ,$ k+1 )) - 1 



fc=l 



< (1 - min(p)) i?(i/0 -log(rrnn(5)J(0,i?(V>))) -1 
= (1 - min(p)) i2(V>) - log J(0, - log(e min(d)), 

where we used the fact that max s£ R(s — exp(s)A) = — log A — 1 for any A > 0. Moreover, for 

r > 0, 

-logj(0,r) = log 1 ) = -r- + log( 1 _ r g _ r ) < -r + log(l + r), 

whence 

£(V0 < — min(p)i?(i/>) + log(l + — log(emin(d)) — > — oo as —s> oo. □ 

Proof of Theorem 12.11 It follows from strict concavity of L and (f5]) that the function ip equals 
tp if, and only if, 

^ j p i v(x i ) = I v(x)f(x)dx (20) 
i=i 

for any function 

Note that any vector v € M m is a linear combination of the vectors i? W, v^ 2 \ . . . , v^ m \ where 

= (l{i < fc})^ • 

With the corresponding functions £ Q we conclude that ip maximizes L if, and only if, 

k rx m 

J^Pi = / v^{x)f(x)dx (21) 

1—1 J xi 



i=l 



for 1 < k < m. Now the vector v^ 71 ' corresponds to the constant function v^ m > := 1, so that (l2Tb 
with k = m is equivalent to F(x m ) = 1. In case of 1 < A; < m, 



^(x) := <( 



1 if x < x k , 

(x k+ i - x)/5 k if x k < x < x k+ i, 

i£x>x k+ i, 
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and it follows from Fubini's theorem that 



x m rx m rl 

v {k \x)f(x)dx = / / \{u < v {k) (x)}duf(x)dx 



xi J xi JO 

1 rx 



ri rxm 

= / / l{x < x k+ \ — u5 k }f(x) dx du 
Jo Jx 1 

= / F(x k+ i - u5 k ) du 
Jo 

= 5 k l / F(r)dr. 

Jx k 

These considerations yield the characterization of the maximizer of L. 

As for the first and second moments, equation d20l ) with v(x) := x yields the assertion that 

YliLi Vi x i equals JJ™ xf(x) dx. Finally, let v := (x?)" =1 and v € Q the corresponding piecewise 

linear function. Then 

m rx m rx m 

' s ^p i xj — \ x 2 f(x)dx = / (v{x) — x 2 )f{x) dx 

J xi J xi 

rx h+1 

22 / (x - x k )(x k+1 - x)f(x)dx 

1. 1 J Xh 



k=l JXk 
m—1 



^2 ^ll(^*,^fc+l)- 



□ 
k=l 

Proof of Theorem 13.11 It is well known from convex analysis that ij) G JC D dom(L) belongs to 
/C* if, and only if, v T VL(ip) < for any vector v € M m such that ip + tv € K, for some t > 0. 
By the special form of /C, the latter condition on v is equivalent to vjv > for all a G A(xj)). In 
other words, v = Y^ILi ^i^i with A a > for all a £ -A(V')- Thus -0 € /C belongs to /C* if, and 
only if, it satisfies ([8]). 

Similarly, a vector -0 € V(A) n dom(L) belongs to V*(vl) if, and only if, i> T VL(i/>) = for 
any vector v in the linear space 

{v G M' m : vjv = for all a e A} = span{6; :i6{l,...,m}\i}. 

But this requirement is obviously equivalent to (©. □ 
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Jongbloed for stimulating discussions about shape-constrained estimation. 



17 



Software. The methods of Rufibach (2006, 2007) as well as the active set method from Sec- 
tion |3]are available in the R package "logcondens" by Rufibach and Diimbgen (2009), avail- 
able from "CRAN". Corresponding Matlab code is available from the first author's homepage on 
www . stat . unibe . ch. 
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